Speaker clustering of speech utterances using a voice characteristic reference space
Abstract
This paper presents an effective technique for clustering speech utterances by speaker. Determining which utterances come from the same speaker requires measuring the similarity of voice characteristics between utterances. Because the vast majority of existing methods evaluate inter-utterance similarity using only the spectrum-based features of each utterance pair, the resulting clusters may reflect environmental conditions or other acoustic classes rather than speaker identity. To compensate for this shortcoming, this study proposes projecting utterances from their spectrum-based feature representation onto a reference space trained to cover the generic voice characteristics inherent in all of the utterances to be clustered. The resulting projection vectors naturally reflect the relationships among all the utterances and are more robust against interference from non-speaker factors. We present three distinct implementations of reference space creation as examples.
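The projection idea can be illustrated with a minimal sketch. It is not one of the three implementations the paper presents; it rests on assumptions made here for illustration only. Each utterance is assumed to be a NumPy array of spectrum-based frames (for example, cepstral vectors), the reference space is assumed to have one diagonal-covariance Gaussian per utterance as a basis, and the projection of an utterance is its vector of average log-likelihoods under all bases. All function names are hypothetical.

```python
import numpy as np


def fit_diag_gaussian(frames, var_floor=1e-3):
    """Fit a diagonal-covariance Gaussian to one utterance (frames: T x D)."""
    mean = frames.mean(axis=0)
    # Floor the variances so that very short or flat utterances stay usable.
    var = np.maximum(frames.var(axis=0), var_floor)
    return mean, var


def avg_log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood of frames under a diagonal Gaussian."""
    const = np.sum(np.log(2.0 * np.pi * var))
    mahal = np.sum((frames - mean) ** 2 / var, axis=1)
    return float(np.mean(-0.5 * (const + mahal)))


def project_onto_reference_space(utterances):
    """Map each utterance to a vector of scores against every reference basis.

    Assumption: the bases are Gaussians trained on the utterances to be
    clustered, so row i describes how utterance i relates to all utterances.
    """
    bases = [fit_diag_gaussian(u) for u in utterances]
    proj = np.array(
        [[avg_log_likelihood(u, m, v) for (m, v) in bases] for u in utterances]
    )
    # Centre and length-normalise each row so that only the pattern of
    # scores across bases matters, not the overall likelihood level.
    proj = proj - proj.mean(axis=1, keepdims=True)
    proj = proj / np.linalg.norm(proj, axis=1, keepdims=True)
    return proj


def cosine_similarity_matrix(proj):
    """Pairwise cosine similarity between unit-length projection vectors."""
    return proj @ proj.T
```

Under these assumptions, the similarity matrix can be passed to any standard clustering procedure. Comparing projection vectors rather than raw feature statistics means two utterances count as similar when they relate to the whole collection in the same way.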
Similar resources
Community detection with manifold learning on speaker i-vector space for Chinese
Speaker recognition with clustering speech signals of the same speaker is an important speech analysis task in various applications. Recent works have shown that there was an underlying manifold on which speaker utterances live in the model-parameter space. However, most speaker clustering methods work on the Euclidean space, and hence often fail to discover the intrinsic geometrical structure ...
Clustering speakers by their voices
The problem of clustering speakers by their voices is addressed. With the mushrooming of available speech data from television broadcasts to voice mail, automatic systems for archive retrieval, organizing and labeling by speaker are necessary. Clustering conversations by speaker is a solution to all three of the above tasks. Another application for speaker clustering is to group utterances toge...
Speech recognition using voice-characteristic-dependent acoustic models
This paper proposes a speech recognition technique based on acoustic models considering voice characteristic variations. Context-dependent acoustic models, which are typically triphone HMMs, are often used in continuous speech recognition systems. This work hypothesizes that the speaker voice characteristics that humans can perceive by listening are also factors in acoustic variation for constr...
Speaker clustering of unknown utterances based on maximum purity estimation
This paper addresses the problem of automatically grouping unknown speech utterances that are from the same speaker. A clustering method based on maximum purity estimation is proposed, with the aim of maximizing the similarities of voice characteristics between utterances within all the clusters. This method employs a genetic algorithm to determine the cluster where each utterance should be loc...
A user-configurable system for voice label recognition
A set of techniques for configuring a speech recognition system to a particular user are described in the context of voice label recognition over the public switched telephone network. User-configurable vocabularies are provided through automatic acoustic baseform determination based on an inventory of speaker independent subword acoustic units. The tendency of input utterances to contain out-ofv...
Publication date: 2004